Abstract

One popular approach to interactively segmenting the foreground
object of interest from an image is to annotate a bounding box
that covers the foreground object. A binary labeling is then
performed to achieve a refined segmentation. One major issue of
existing algorithms for such interactive image segmentation is
their preference for an input bounding box that tightly encloses
the foreground object. This increases the annotation burden and
prevents these algorithms from utilizing automatically detected
bounding boxes. In this paper, we develop a new LooseCut
algorithm that can handle cases where the input bounding box only
loosely covers the foreground object. We propose a new Markov
Random Field (MRF) model for segmentation with loosely bounded
boxes, including a global similarity constraint to better
distinguish the foreground and background, and an additional
energy term to encourage consistent labeling of
similar-appearance pixels. This MRF model is then solved by an
iterated max-flow algorithm. In the experiments, we evaluate
LooseCut on three publicly available image datasets and compare
its performance against several state-of-the-art interactive
image segmentation algorithms. We also show that LooseCut can be
used to enhance the performance of unsupervised video
segmentation and image saliency detection.

1. Introduction 

Accurately segmenting a foreground object of interest from an
image with convenient human interaction plays a central role in
image and video editing. One widely used interaction is to
annotate a bounding box around the foreground object. On one
hand, this input bounding box provides the spatial location of
the foreground. On the other hand, based on the image information
within and outside this bounding box, we can obtain an initial
estimate of the appearance models of the foreground and
background, with which a binary labeling is finally performed to
achieve a refined segmentation of the foreground and background.

Figure 1. Sample results from GrabCut and the proposed LooseCut
with tightly and loosely bounded boxes.

However, due to the complexity of the object boundary and
appearance, most existing methods of this kind prefer the input
bounding box to tightly enclose the foreground object. An example
is shown in Fig. 1, where the widely used GrabCut [13] algorithm
fails when the bounding box does not tightly cover the foreground
object. The preference for a tight bounding box increases the
burden of the human interaction. Moreover, it prevents these
algorithms from utilizing automatically generated bounding boxes,
such as boxes from object proposals, which are usually not
guaranteed to tightly cover the foreground object. In this paper,
we focus on developing a new LooseCut algorithm that can
accurately segment the foreground object with loosely bounded
boxes.

A loosely bounded box may contain more background than a tightly
bounded box. As a result, the initial appearance model of the
foreground, estimated from the pixels within the bounding box, is
highly inaccurate. This may substantially reduce the segmentation
performance, as shown by the GrabCut result in Fig. 1. In this
paper, we propose two strategies to address this problem. First,
we explicitly emphasize the appearance difference between the
foreground and background models. Second, we explicitly encourage
consistent labeling of similar-appearance pixels, whether
adjacent or non-adjacent. These two strategies help identify the
background pixels within the bounding box, as shown in Fig. 2.


Figure 2. An illustration of the two strategies used in the
proposed LooseCut algorithm. (a) By emphasizing their appearance
difference, the foreground and background are better separated
even with a loosely bounded box (panels compare the difference
between the extracted foreground and background by GrabCut and by
LooseCut). (b) By encouraging label consistency of
similar-appearance pixels, the background pixel $P_b$ inside the
loosely bounded box is correctly labeled as background due to its
appearance similarity to the background pixel $P_a$ outside the
bounding box.

In this paper, we follow GrabCut by formulating the
foreground/background segmentation as a binary labeling over an
MRF built upon the image grid, with the appearances of the
foreground and background described by two Gaussian Mixture
Models (GMMs). More specifically, we add a global similarity
constraint and a label consistency term to the MRF energy to
implement the two strategies mentioned above. Finally, we solve
the proposed MRF model using an iterated max-flow algorithm. In
the experiments, we evaluate the proposed LooseCut on three
publicly available image datasets and compare its performance
against several state-of-the-art interactive image segmentation
algorithms. We also show that LooseCut can be used to enhance the
performance of unsupervised video segmentation and image saliency
detection.

The remainder of the paper is organized as follows. Section 2
reviews the related work. Section 3 describes the proposed
LooseCut algorithm in detail. Section 4 reports the experimental
results, followed by a brief conclusion in Section 5.


2. Related Work 

In recent years, interactive image segmentation based on input
bounding boxes has drawn much attention in the computer vision
and graphics communities, resulting in a number of effective
algorithms [15] [17] [16] [18] [13] [10]. Starting from the
classical GrabCut algorithm, many of these algorithms use graph
cut models: the input image is modeled by a graph, and the
foreground/background segmentation is then modeled by a binary
graph cut that minimizes a pre-defined energy function [6]. In
GrabCut [13], initial appearance models of the foreground and
background are estimated using the image information within and
outside the bounding box. A binary MRF model is then applied to
label each pixel as foreground or background, based on which the
appearance models of the foreground and background are
re-estimated. This process is repeated until convergence. As
illustrated in Fig. 1, the performance of GrabCut is highly
dependent on the initial estimation of the appearance models of
the foreground and background, which might be very poor when the
input bounding box does not tightly cover the foreground object.
The LooseCut algorithm developed in this paper also follows the
general procedure introduced in GrabCut, but introduces a new
constraint and a new energy term to the MRF model to specifically
handle loosely bounded boxes.

PinPoint is another MRF-based algorithm for interactive image
segmentation with a bounding box. It incorporates a topology
prior derived from geometric properties of the bounding box and
encourages the segmented foreground to be tightly enclosed by the
bounding box. Therefore, its performance gets much worse with a
loosely bounded box. Also using an MRF model, OneCut [17] was
recently developed for interactive image segmentation. Its main
contribution is to incorporate an MRF energy term that reflects
the appearance overlap between foreground and background
histograms. As shown in the later experiments, the
$L_1$-distance based appearance overlap used in OneCut is still
insufficient to handle loosely bounded boxes. In [16], a pPBC
algorithm is developed for interactive image segmentation using
an efficient parametric pseudo-bound optimization strategy.
However, in our experiments shown in Section 4, pPBC still cannot
give satisfactory segmentation results when the input bounding
box is loose.

Rather than using an MRF model, MILCut [18] formulates
interactive image segmentation as a multiple instance learning
problem by generating positive bags along the sweeping lines
within the bounding box. MILCut may not generate the desired
positive bags along the sweeping lines for a loosely bounded box.
Active contour methods take the input bounding box as an initial
contour and iteratively deform it toward the boundary of the
foreground object. Due to their sensitivity to image noise,
active contours usually require the initial contour to be close
to the underlying foreground object boundary.


3. Proposed Approach 

In this section, we first briefly review the classical GrabCut
algorithm and then explain the proposed LooseCut algorithm.
3.1. GrabCut

GrabCut [13] performs a binary labeling of each pixel using an
MRF model. Let $X = \{x_i\}_{i=1}^{n}$ be the binary labels of
the pixels, where $x_i = 1$ if pixel $i$ is in the foreground and
$x_i = 0$ if it is in the background, and let
$\theta = (M_f, M_b)$ denote the appearance models, consisting of
the foreground GMM $M_f$ and the background GMM $M_b$. GrabCut
seeks an optimal labeling that minimizes

$$E_{GC}(X, \theta) = \sum_{i} D(x_i, \theta) + \sum_{i,j \in \mathcal{N}} V(x_i, x_j), \qquad (1)$$

where $\mathcal{N}$ defines a pixel neighboring system, e.g.,
4-neighbor or 8-neighbor connectivity. The unary term
$D(x_i, \theta)$ measures the cost of labeling pixel $i$ as
foreground or background based on the appearance models $\theta$.
The pairwise term $V(x_i, x_j)$ enforces the smoothness of the
labels by penalizing neighboring pixels that take different
labels. The max-flow algorithm [6] is usually used for solving
this MRF optimization problem. GrabCut takes the following steps
to achieve the binary image segmentation with an input bounding
box:

1. Estimate the initial appearance models $\theta$, using the
pixels inside and outside the bounding box, respectively.

2. Based on the current appearance models $\theta$, quantify the
foreground and background likelihood of each pixel and use it to
define the unary term $D(x_i, \theta)$. Then solve for the
optimal labeling that minimizes Eq. (1).

3. Based on the obtained labeling $X$, refine $\theta$ and go
back to Step 2. Repeat this process until convergence.

3.2. MRF Model for LooseCut

Following the MRF model used in GrabCut, the proposed LooseCut
takes the following MRF energy function:

$$E(X, \theta) = E_{GC}(X, \theta) + \beta E_{LC}(X), \qquad (2)$$

where $E_{GC}$ is the GrabCut energy given in Eq. (1), and
$E_{LC}$ is an energy term for encouraging label consistency,
weighted by $\beta > 0$. In minimizing Eq. (2), we enforce a
global similarity constraint to better estimate $\theta$ and
distinguish the foreground and background. In the following, we
elaborate on the global similarity constraint and the label
consistency term $E_{LC}(X)$.

3.3. Global Similarity Constraint

In this section, we define the proposed global similarity
constraint. Let $M_f$ have $K_f$ Gaussian components $M_f^i$ with
means $\mu_f^i$, $i = 1, 2, \dots, K_f$, and let $M_b$ have $K_b$
Gaussian components $M_b^j$ with means $\mu_b^j$,
$j = 1, 2, \dots, K_b$. For each Gaussian component $M_f^i$ in
the foreground GMM $M_f$, we first find its nearest Gaussian
component in $M_b$ as

$$j(i) = \arg\min_{j \in \{1, \dots, K_b\}} \left| \mu_f^i - \mu_b^j \right|. \qquad (3)$$

With this, we can define the similarity between the Gaussian
component $M_f^i$ and the entire background GMM $M_b$ as

$$S(M_f^i, M_b) = \frac{1}{\left| \mu_f^i - \mu_b^{j(i)} \right|}, \qquad (4)$$

which is the inverse of the mean difference between $M_f^i$ and
its nearest Gaussian component in the background GMM. Then, we
define the global similarity function $Sim$ as

$$Sim(M_f, M_b) = \sum_{i=1}^{K_f} S(M_f^i, M_b). \qquad (5)$$

A similar definition of GMM distance can be found in [19]. In the
MRF minimization, we enforce the global similarity
$Sim(M_f, M_b)$ to be low (smaller than a threshold) in the step
of estimating $\theta$; details are discussed in Section 3.5.

3.4. Label Consistency Term $E_{LC}$

To encourage the label consistency of similar-appearance pixels,
whether adjacent or non-adjacent, we first cluster all the image
pixels using a recent superpixel algorithm [21] that preserves
both feature and spatial consistency. Following a K-means-style
procedure, this clustering algorithm partitions the image into a
set of compact superpixels, and each resulting cluster is made up
of one or more superpixels. An example is shown in Fig. 3, where
the region color indicates the clusters: superpixels with the
same color constitute a cluster.



Figure 3. An illustration of the superpixel based clusters and label 
consistency. Three clusters are shown in different colors. 

Let $C_k$ denote cluster $k$. Pixels belonging to $C_k$ are
encouraged to take the same label, e.g., $p_1$ and $p_2$ in
Fig. 3. To accomplish this, we set a cluster label $x_{C_k}$
(taking values 0 or 1) for each cluster $C_k$ and define the
label-consistency energy term as

$$E_{LC}(X) = \sum_{k} \sum_{i \in C_k} \phi(x_i \neq x_{C_k}), \qquad (6)$$

where $\phi(\cdot)$ is an indicator function taking the value 1
or 0 for a true or false argument, respectively. In the proposed
algorithm, we solve for both the pixel labels and the cluster
labels simultaneously in the MRF optimization.
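
To make the label-consistency term concrete, the following minimal
sketch evaluates Eq. (6) on a toy labeling. The function names, the
flat-list data layout, and the majority-vote helper are assumptions
made for illustration. In particular, the helper only shows which
cluster labels minimize Eq. (6) when the pixel labels are held
fixed; the paper instead solves for pixel and cluster labels jointly
by max-flow.

```python
from collections import Counter

def label_consistency_energy(pixel_labels, cluster_of, cluster_labels):
    """Eq. (6): count pixels whose label differs from their cluster's label."""
    return sum(
        1
        for i, x in enumerate(pixel_labels)
        if x != cluster_labels[cluster_of[i]]
    )

def best_cluster_labels(pixel_labels, cluster_of, num_clusters):
    """For FIXED pixel labels, the majority label of each cluster minimizes Eq. (6)."""
    votes = [Counter() for _ in range(num_clusters)]
    for x, k in zip(pixel_labels, cluster_of):
        votes[k][x] += 1
    # Ties are broken toward background (label 0) in this illustration.
    return [1 if v[1] > v[0] else 0 for v in votes]

# Toy example: six pixels, two clusters of three pixels each.
pixel_labels = [1, 1, 0, 0, 0, 1]
cluster_of = [0, 0, 0, 1, 1, 1]
cluster_labels = best_cluster_labels(pixel_labels, cluster_of, num_clusters=2)
energy = label_consistency_energy(pixel_labels, cluster_of, cluster_labels)
```

Here each cluster contains one pixel that disagrees with the
majority of its cluster, so the energy counts two inconsistent
pixels.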

3.5. Optimization 

In this section, we propose an algorithm to find the optimal
binary labeling that minimizes the energy function defined in
Eq. (2), subject to the global similarity constraint.
Specifically, in each iteration, we first fix the labeling $X$
and optimize over $\theta$ by enforcing the global similarity
constraint on $Sim(M_f, M_b)$. After that, we fix $\theta$ and
find an optimal $X$ that minimizes $E(X, \theta)$. These two
optimization steps are repeated alternately until convergence or
until a preset maximum number of iterations is reached. As an
initialization, we use the input bounding box to define a binary
labeling $X$ in iteration 0. In the following, we elaborate on
these two optimization steps.

Fixing $X$ and Optimizing over $\theta$: With a fixed binary
labeling $X$, we can estimate $\theta$ using a standard EM-based
clustering algorithm: all the pixels with label 1 are taken for
computing the foreground GMM $M_f$, and all the pixels with label
0 are used for computing the background GMM $M_b$. We
intentionally select $K_f$ and $K_b$ such that
$K = K_f - K_b > 0$, since some background components are mixed
into the foreground for the initial $X$ defined by a loosely
bounded box. For the obtained $M_f$ and $M_b$, we examine whether
the global similarity constraint is satisfied, i.e., whether
$Sim(M_f, M_b) < \delta$. If this constraint is satisfied, we
take the resulting $\theta$ and continue to the next step of
optimization. If this constraint is not satisfied, we further
refine $M_f$ using the following algorithm:

1. Calculate the similarity $S(M_f^i, M_b)$ between each Gaussian
component $M_f^i$ of $M_f$ and $M_b$ by following Eq. (4), and
identify the $K$ Gaussian components of $M_f$ with the largest
similarity to $M_b$.

2. Among these $K$ components, if any one, say $M_f^i$, does not
satisfy $S(M_f^i, M_b) < \delta$, we delete it from $M_f$.

3. After all the deletions, we use the remaining Gaussian
components to construct an updated $M_f$.

This algorithm is intended to ensure that the updated $M_f$ and
$M_b$ satisfy the global similarity constraint.
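
The similarity measures of Eqs. (3)-(5) and the three refinement
steps above can be sketched as follows. This is only an
illustration under stated assumptions: GMM components are
represented by their mean vectors alone, the Euclidean distance is
used for the mean difference, colors are on a 0-255 scale, and the
function names are hypothetical. The threshold value 0.02 and K = 1
follow the parameter settings reported in Section 4, although the
color scale they refer to is not stated in the paper.

```python
import math

def component_similarity(mu_f, bg_means):
    """Eqs. (3)-(4): inverse distance from mu_f to its nearest background mean."""
    nearest = min(math.dist(mu_f, mu_b) for mu_b in bg_means)
    return math.inf if nearest == 0 else 1.0 / nearest

def global_similarity(fg_means, bg_means):
    """Eq. (5): sum of the per-component similarities."""
    return sum(component_similarity(mu, bg_means) for mu in fg_means)

def prune_foreground(fg_means, bg_means, k, delta):
    """Among the k foreground components most similar to the background,
    delete those whose similarity is not below delta; keep all others."""
    sims = [component_similarity(mu, bg_means) for mu in fg_means]
    ranked = sorted(range(len(fg_means)), key=lambda i: sims[i], reverse=True)
    drop = {i for i in ranked[:k] if not sims[i] < delta}
    return [mu for i, mu in enumerate(fg_means) if i not in drop]

# Toy RGB means: the second foreground component is really background.
fg_means = [(200, 30, 30), (90, 160, 90), (250, 250, 250)]
bg_means = [(100, 160, 90), (20, 20, 20)]
delta = 0.02

before = global_similarity(fg_means, bg_means)
kept = prune_foreground(fg_means, bg_means, k=1, delta=delta)
after = global_similarity(kept, bg_means)
```

In this toy case the greenish foreground component lies 10 units
from a background mean, so its similarity of 0.1 violates the
threshold and it is removed, after which the global similarity
falls below the threshold.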

Fixing $\theta$ and Optimizing over $X$: Inspired by [17], we
build an undirected graph with auxiliary nodes, as shown in
Fig. 4, to find an optimal $X$ that minimizes the energy
$E(X, \theta)$. In this graph, each pixel is represented by a
node. For each pixel cluster $C_k$, we construct an auxiliary
node $A_k$ to represent it. Edges are constructed to link the
auxiliary node $A_k$ and the nodes that represent the pixels in
$C_k$, with the edge weight $\beta$ as used in Eq. (2). An
example of the constructed graph is shown in Fig. 4, where the
three pink nodes represent three pixels in the same cluster,
which is represented by the auxiliary node $A_1$. All the nodes
in blue represent another cluster. With a fixed $\theta$, we use
the max-flow algorithm [6] on this graph to seek an optimal $X$
that minimizes the energy $E(X, \theta)$.
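
The graph construction described above can be illustrated with a
small self-contained sketch. Everything here is an assumption made
for illustration: the unary costs are given directly as numbers
rather than derived from GMMs, the source side of the cut is taken
to mean foreground, the node naming is hypothetical, and a plain
Edmonds-Karp routine stands in for the max-flow solver of [6].

```python
from collections import defaultdict, deque

def build_graph(fg_cost, bg_cost, pairwise, cluster_of, beta):
    """Capacities of the s-t graph; the source side means foreground (label 1)."""
    cap = defaultdict(lambda: defaultdict(float))
    for i, (cf, cb) in enumerate(zip(fg_cost, bg_cost)):
        cap["S"][("p", i)] += cb  # cut when pixel i is labeled background
        cap[("p", i)]["T"] += cf  # cut when pixel i is labeled foreground
    links = [(("p", i), ("p", j), w) for i, j, w in pairwise]
    # One auxiliary node ("a", k) per cluster, linked to its pixels with weight beta.
    links += [(("p", i), ("a", k), beta) for i, k in enumerate(cluster_of)]
    for u, v, w in links:  # undirected edge: capacity in both directions
        cap[u][v] += w
        cap[v][u] += w
    return cap

def source_side(cap, s="S", t="T"):
    """Edmonds-Karp max-flow; returns the nodes reachable from s at the end."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

def segment(fg_cost, bg_cost, pairwise, cluster_of, beta):
    cap = build_graph(fg_cost, bg_cost, pairwise, cluster_of, beta)
    reachable = source_side(cap)
    return [1 if ("p", i) in reachable else 0 for i in range(len(fg_cost))]

# Four pixels, two clusters; pairwise edges are left out to isolate the
# effect of the auxiliary nodes. Pixel 1 slightly prefers background.
fg_cost = [0.1, 0.6, 2.0, 2.0]
bg_cost = [2.0, 0.5, 0.1, 0.1]
cluster_of = [0, 0, 1, 1]
without_term = segment(fg_cost, bg_cost, [], cluster_of, beta=0.0)
with_term = segment(fg_cost, bg_cost, [], cluster_of, beta=0.2)
```

With the auxiliary edges switched off, pixel 1 follows its own unary
cost and is labeled background. With them switched on, disagreeing
with its cluster mate costs more than the small unary difference, so
pixel 1 is pulled to the foreground together with pixel 0.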



Figure 4. Graph construction for the step of optimizing over $X$
with a fixed $\theta$. The $v_i$'s are the nodes for pixels and
the $A_i$'s are the auxiliary nodes. S and T are the source and
sink nodes. Nodes of the same color represent a cluster.

The graph constructed as in Fig. 4 is similar to the graph
constructed in OneCut [17]. However, there are two major
differences between the proposed algorithm and OneCut.

1. In OneCut, a color histogram is first constructed for the
input image, and one auxiliary node is then constructed for each
histogram bin. All the pixels are quantized into these bins, and
the pixels in each bin are linked to the corresponding auxiliary
node. In this paper, we use superpixel-based clusters to define
the auxiliary nodes.

2. The unary energy term in OneCut is different from the one in
the proposed method. As a result, we define the edge weights
involving the source and sink nodes differently from OneCut.
OneCut follows the ballooning technique: the weight is set to 1
for the edges between S and any pixels inside the bounding box,
and 0 otherwise. Similarly, the weight is set to 0 for the edges
between T and any pixels in the bounding box, and $\infty$
otherwise. In the proposed algorithm, the weights of the edges
that are incident to S or T reflect the unary term in Eq. (2),
which is based on the appearance models $\theta$.

With these two differences, OneCut seeks to minimize the
$L_1$-distance based histogram overlap between the foreground and
background. This is different from the goal of the proposed
algorithm: we seek better label consistency of the pixels in the
same cluster by using this graph structure. We compare with
OneCut in the later experiments. The full LooseCut algorithm is
summarized in Algorithm 1.
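
The alternating structure of the optimization in this section can
be summarized by the skeleton below. It is a sketch rather than the
authors' implementation: the three callables (`fit_models`,
`enforce_similarity`, `solve_labels`) are hypothetical placeholders
for EM-based GMM fitting, the component-deletion step, and the
max-flow labeling, and the toy example replaces them with
one-dimensional stand-ins so that the loop can be run as is.

```python
def loosecut_loop(initial_labels, fit_models, enforce_similarity,
                  solve_labels, max_iters=10):
    """Alternate between updating the appearance models and the labeling."""
    labels = list(initial_labels)
    for _ in range(max_iters):
        # Fix X, optimize over theta under the global similarity constraint.
        models = enforce_similarity(fit_models(labels))
        # Fix theta, optimize over X (max-flow in the paper).
        new_labels = solve_labels(models)
        if new_labels == labels:  # converged: labeling no longer changes
            break
        labels = new_labels
    return labels

# Toy stand-ins on 1-D intensities: models are just the class means.
values = [0.9, 0.8, 0.2, 0.1, 0.15]

def fit_models(labels):
    fg = [v for v, x in zip(values, labels) if x == 1]
    bg = [v for v, x in zip(values, labels) if x == 0]
    return sum(fg) / len(fg), sum(bg) / len(bg)

def enforce_similarity(models):
    return models  # no pruning in this toy example

def solve_labels(models):
    fg_mean, bg_mean = models
    return [1 if abs(v - fg_mean) < abs(v - bg_mean) else 0 for v in values]

# A "loose box" initially marks the first four values as foreground.
result = loosecut_loop([1, 1, 1, 1, 0], fit_models, enforce_similarity,
                       solve_labels)
```

The default of 10 iterations mirrors the cap used in the
running-time experiment. The toy stand-ins assume both classes stay
non-empty, which a real implementation would need to guard against.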

4. Experiments 

To evaluate the proposed LooseCut algorithm, we conduct
experiments on three widely used image datasets: the GrabCut
dataset [13], the Weizmann dataset, and the iCoseg dataset [4].
We compare its performance against several state-of-the-art
interactive image segmentation methods, including GrabCut [13],
OneCut [17], MILCut [18], and pPBC [16]. We also conduct
experiments to show the effectiveness of LooseCut in two
applications: unsupervised video segmentation and image saliency
detection.

Algorithm 1 LooseCut

Input: Image $I$, bounding box $B$, number of clusters $N$
Output: Binary labeling $X$ of the pixels in $I$

1: Construct $N$ superpixel-based clusters using [21].
2: Create the initial labeling $X$ using box $B$.
3: repeat
4:     Based on the current labeling $X$, estimate and update
       $\theta$ by enforcing $Sim(M_f, M_b) < \delta$.
5:     Construct the graph using the updated $\theta$ with $N$
       auxiliary nodes, as shown in Fig. 4.
6:     Apply the max-flow algorithm [6] to update the labeling
       $X$ by minimizing $E(X, \theta)$.
7: until convergence or the maximum number of iterations is reached


Metrics: As in [18] [17] [13], we use the Error Rate to evaluate
an interactive image segmentation by counting the percentage of
misclassified pixels inside the bounding box. We also take the
pixel-wise F-measure as an evaluation metric, which combines the
precision and recall with respect to the ground-truth
segmentation.
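
The two metrics can be sketched as follows. The function names and
the flat 0/1 list representation are assumptions for illustration,
and the F-measure is taken here to be the usual harmonic mean of
precision and recall, which the text does not spell out.

```python
def error_rate(pred, truth, in_box):
    """Percentage of misclassified pixels among those inside the bounding box."""
    inside = [(p, t) for p, t, b in zip(pred, truth, in_box) if b]
    wrong = sum(1 for p, t in inside if p != t)
    return 100.0 * wrong / len(inside)

def f_measure(pred, truth):
    """Harmonic mean of pixel-wise precision and recall (assumed definition)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# Eight pixels; the bounding box covers the first six.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
in_box = [1, 1, 1, 1, 1, 1, 0, 0]
er = error_rate(pred, truth, in_box)
fm = f_measure(pred, truth)
```

Two of the six in-box pixels are wrong, giving an Error Rate of one
third (about 33.3%), and precision and recall are both two thirds,
so the F-measure is also two thirds.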


Parameter Settings: For the number of Gaussian components in the
GMMs, $K_b$ is set to 5 and $K_f$ is set to 6. As discussed in
Section 3.5, $K = K_f - K_b = 1$. To enforce the global
similarity constraint, we delete $K = 1$ component in $M_f$. The
number of clusters (auxiliary nodes in the graph) is set to
$N = 16$. For the LooseCut energy defined in Eq. (2), we
consistently set $\beta = 0.01$. The unary term and binary term
in Eq. (1) are the same as in [13], and RGB color features are
used to construct the GMMs. We set $\delta = 0.02$ when deleting
the foreground GMM component to enforce the global similarity
constraint. For all the comparison methods, we follow the default
or recommended settings in their codes.


Figure 5. Bounding boxes with different looseness. From left to
right: image, ground-truth foreground, baseline bounding box
(L = 0%), and a series of bounding boxes with increased looseness
(L = 240% and L = 600%).


4.1. Interactive Image Segmentation 

In this experiment, we construct bounding boxes with different
looseness and examine the resulting segmentation. As illustrated
in Fig. 5, we compute the box that tightly fits the ground-truth
foreground and slightly dilate it by 10 pixels along four
directions, i.e., left, right, up, and down. We take it as the
baseline bounding box with 0% looseness. We then keep dilating
this bounding box uniformly along all four directions to generate
a series of looser bounding boxes: a box with a looseness $L$ (in
percentage) has an area that is larger than that of the baseline
bounding box by $L$. A bounding box is cropped when any of its
sides reaches the image boundary. An example is shown in Fig. 5.
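
One way to generate such boxes is sketched below. It assumes that
the box grows by the same margin d on every side, so that d solves
(w + 2d)(h + 2d) = (1 + L) w h for a baseline box of width w and
height h, and that looseness refers to the area before cropping. The
function name and the (x0, y0, x1, y1) box layout are hypothetical,
and the paper does not state how looseness is measured once a box
has been cropped.

```python
import math

def loosen_box(box, looseness, image_size):
    """Dilate a box uniformly so that its area grows by `looseness`
    (2.4 means 240%), then crop it to the image."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    # Positive root of 4 d^2 + 2 (w + h) d - looseness * w * h = 0.
    d = (-(w + h) + math.sqrt((w + h) ** 2 + 4.0 * looseness * w * h)) / 4.0
    width, height = image_size
    return (max(0.0, x0 - d), max(0.0, y0 - d),
            min(float(width), x1 + d), min(float(height), y1 + d))

# Baseline box of 100 x 50 pixels, well inside a large image.
loose = loosen_box((200.0, 200.0, 300.0, 250.0), 2.4, (1000, 1000))
loose_area = (loose[2] - loose[0]) * (loose[3] - loose[1])

# The same looseness near the border of a small image gets cropped.
cropped = loosen_box((0.0, 0.0, 100.0, 50.0), 2.4, (120, 60))
```

For the uncropped case the resulting area is 3.4 times the baseline
area, i.e., an increase of 240%.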

The GrabCut dataset [13] consists of 50 images. Nine of them
contain multiple objects while the ground truth is only annotated
on a single object, e.g., the ground truth only labels one person
but there are two people in the loosely bounded box. Such images
are not suitable for testing how performance changes as the box
looseness increases. Therefore, we use the remaining 41 images in
our experiments. From the Weizmann dataset, we pick a subset of
45 images for testing, by discarding the images where the
baseline bounding box almost covers the full image and cannot be
dilated to construct looser bounding boxes. For the same reason,
we select a subset of 45 images from the iCoseg dataset [4] for
our experiment.

Experimental results on these three datasets are summarized in
Fig. 6. In general, the segmentation performance degrades when
the bounding-box looseness increases, for both the proposed
LooseCut and all the comparison methods. However, LooseCut shows
a slower performance degradation than the comparison methods.
When the looseness is high, e.g., L = 300% or L = 600%, LooseCut
shows a much higher F-measure and a much lower Error Rate than
all the comparison methods. Since MILCut's code is not publicly
available, we only report MILCut's F-measure and Error Rate
values with the baseline bounding boxes on the GrabCut dataset
and the Weizmann dataset, copied from the original paper. Table 1
reports the F-measure and Error Rate of segmentation with
bounding boxes of varying looseness on the GrabCut dataset.
Sample segmentation results, together with the input bounding
boxes with different looseness, are shown in Fig. 7.

4.2. Unsupervised Video Segmentation 

The goal of unsupervised video segmentation is to automatically
segment the objects of interest from each video frame. The
segmented objects can then be associated across frames to infer
the motion and action of these objects. This is important for
video analysis and semantic understanding. One popular approach
for unsupervised video segmentation is to detect a set of object
proposals, in the form of bounding boxes, from each frame and
then extract the objects of interest from these proposals [20].

Figure 6. Interactive image segmentation performance (top:
F-measure; bottom: Error Rate) on three widely used datasets.

            L = 0%          L = 120%        L = 240%        L = 600%
Methods     F      ER (%)   F      ER (%)   F      ER (%)   F      ER (%)
GrabCut     0.916   7.4     0.858  10.1     0.816  12.6     0.788  13.7
OneCut      0.923   6.6     0.853   8.7     0.785   9.9     0.706  13.7
pPBC        0.910   7.5     0.844   9.1     0.827   9.4     0.783  12.3
MILCut      -       3.6     -       -       -       -       -       -
LooseCut    0.882   7.9     0.867   5.8     0.844   6.9     0.826   6.8

Table 1. Segmentation performance (F: F-measure; ER: Error Rate)
on the GrabCut dataset with bounding boxes of different
looseness.

In practice, a detected proposal may only cover part of the
object of interest, so we detect a set of object proposals and
merge them together to construct a large mask, which has a better
chance of covering the whole object. Clearly, this merged mask
may only loosely bound the object of interest, and the object can
be extracted by mask-based segmentation algorithms. Specifically,
we apply a recent FusionEdgeBox algorithm [22] to detect the top
10 object proposals in each video frame for the merged mask.
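
A minimal sketch of the merging step is given below. It assumes
that "merging" means taking the union of the proposal boxes, which
the text does not state explicitly, and it uses hypothetical names
and half-open integer pixel ranges for the boxes. The proposal
detector itself is not modeled.

```python
def merge_proposals(boxes, width, height):
    """Binary mask (list of rows) set to 1 wherever any proposal covers a pixel."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:  # box covers columns [x0, x1) and rows [y0, y1)
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask

# Two overlapping proposals on an 8 x 6 frame.
proposals = [(1, 1, 4, 3), (3, 2, 6, 5)]
mask = merge_proposals(proposals, width=8, height=6)
covered = sum(map(sum, mask))
```

The first box covers 6 pixels, the second covers 9, and they share
one pixel, so the merged mask covers 14 pixels. Such a mask can then
serve as the loose initial labeling for the segmentation.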

This experiment is conducted on a subset (21 videos, 657 frames)
of the JHMDB video dataset [9]. Table 2 shows the unsupervised
video segmentation performance, in terms of F-measure and Error
Rate averaged over all the frames. We can see that the proposed
LooseCut substantially outperforms GrabCut, OneCut and pPBC in
this task. Sample video segmentation results are shown in Fig. 8.


Methods               F-measure   Error Rate (%)
FusionEdgeBox Mask    0.35        77.0
GrabCut               0.55        30.5
OneCut                0.58        25.1
pPBC                  0.54        31.6
LooseCut              0.64        17.0

Table 2. Unsupervised video segmentation performance.


4.3. Image Saliency Detection 

Recently, GrabCut has been used to detect the salient area in an
image [14]. As illustrated in Fig. 9, a set of pre-defined
bounding boxes is overlaid on the input image and, with each
bounding box, GrabCut is applied for a foreground segmentation.
The probabilistic saliency map is finally constructed by
combining all the foreground segmentations. In this setting, many
of the pre-defined bounding boxes are clearly not tight.

In this experiment, out of the 1000 images in the Salient Object
dataset, we randomly select 100 images for testing. The 15
pre-defined masks are shown in Fig. 9. For quantitative
evaluation, following prior work, we binarize each resulting
saliency map using an adaptive threshold (two times the mean
saliency of the map). Table 3 reports the precision, recall and
F-measure of saliency detection when using GrabCut, OneCut, pPBC,
and LooseCut for foreground segmentation. We also include
comparisons with two state-of-the-art saliency detection methods
that do not use pre-defined masks, namely FT [1] and RC [7].
Sample saliency detection results are shown in Fig. 10.
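
The evaluation pipeline described above can be sketched as follows.
The adaptive threshold (twice the mean saliency) comes from the
text, whereas combining the per-box segmentations by simple
averaging is an assumption, since the text only says that they are
combined. The function names and flat-list masks are hypothetical.

```python
def saliency_map(masks):
    """Average the binary foreground masks obtained from the different boxes."""
    n = len(masks)
    return [sum(column) / n for column in zip(*masks)]

def binarize(saliency):
    """Adaptive threshold: two times the mean saliency of the map."""
    threshold = 2.0 * sum(saliency) / len(saliency)
    return [1 if s >= threshold else 0 for s in saliency]

# Four segmentations of an 8-pixel image, one per pre-defined box.
masks = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
]
saliency = saliency_map(masks)
salient = binarize(saliency)
```

In this toy case the mean saliency is 0.25, so the threshold is 0.5
and only the first two pixels are kept as salient.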

We can see that LooseCut outperforms GrabCut, OneCut and pPBC in
this task. It also outperforms FT, which does not use
bounding-box based segmentation. RC [7] achieves the best
performance for saliency detection, because it combines more
complex saliency cues than the segmentation-based approach.

Figure 7. Sample results for interactive image segmentation.

Figure 8. Sample video segmentation. From left to right: top 10
detected object proposals (red rectangles), merged mask and
different segmentation results.

Figure 9. Segmentation based saliency detection.


                                           GrabCut Dataset   Weizmann Dataset  iCoseg Dataset
Methods                                    F      ER (%)     F      ER (%)     F      ER (%)
LooseCut w/o proposed constraint & term    0.788  13.7       0.688  19.4       0.686  15.0
LooseCut w/o global similarity constraint  0.801  12.0       0.709  17.9       0.691  14.8
LooseCut w/o label consistency term        0.822   7.3       0.836   7.4       0.806   6.3
LooseCut                                   0.826   6.8       0.841   6.6       0.808   6.1

Table 4. The usefulness of the proposed global similarity
constraint and the label consistency term in LooseCut (F:
F-measure; ER: Error Rate).


Figure 10. Sample saliency detection results. Columns, from left
to right: image, FT [1], RC [7], GrabCut, OneCut, pPBC, LooseCut.


Methods     Precision   Recall   F-measure
FT [1]      0.75        0.57     0.61
RC [7]      0.86        0.85     0.84
GrabCut     0.85        0.61     0.67
OneCut      0.86        0.76     0.77
pPBC        0.84        0.66     0.69
LooseCut    0.84        0.78     0.78

Table 3. Performance of saliency detection.


4.4. Additional Results 

In this section, we report additional results that justify the
usefulness of the global similarity constraint and the label
consistency term, and we discuss the running time of the proposed
algorithm and possible failure cases.

We run experiments on the three image segmentation datasets with
L = 600% by removing the global similarity constraint and/or the
label consistency term, together with their corresponding
optimization steps, from the proposed LooseCut algorithm. The
quantitative performance is shown in Table 4. We can see that
both the global similarity constraint and the label consistency
term help improve the segmentation performance. The global
similarity constraint improves the segmentation performance more
significantly than the label consistency term.

For the running time, we test LooseCut and all the comparison
methods on a PC with an Intel 3.3GHz CPU and 4GB RAM. We compare
their running times for different image sizes. In this
experiment, OneCut only has one iteration, and the iterations of
GrabCut and LooseCut are stopped upon convergence or when a
maximum of 10 iterations is reached. As shown in Table 5, if the
image size is less than 512 x 512, the running times of GrabCut,
OneCut and LooseCut are very close. For large images, LooseCut
and OneCut take more time than GrabCut. In general, LooseCut
still shows a reasonable running time. Our current LooseCut code
is implemented in Matlab and C++, and it can be substantially
optimized for speed.


Methods     64 x 64   128 x 128   256 x 256   512 x 512   1024 x 1024
GrabCut     0.16      0.28        1.47         3.81        25.21
OneCut      0.03      0.09        0.49         5.72        77.80
pPBC        0.14      0.37        2.70        26.14       305.60
LooseCut    0.32      0.43        1.68         7.63        66.52

Table 5. Running time (in seconds) with increasing image size.


Because it relies on the proposed global similarity constraint
and label consistency term, LooseCut may fail when the foreground
and background show highly similar appearances, as shown in
Fig. 11.



Figure 11. Failure cases of LooseCut. 

5. Conclusion

This paper proposed a new LooseCut algorithm for interactive
image segmentation that takes a loosely bounded box as input. We
introduced a global similarity constraint and a label consistency
term into the MRF model, and developed an iterative algorithm to
solve the new MRF model. Experiments on three image segmentation
datasets showed the effectiveness of LooseCut against several
state-of-the-art algorithms. We also showed that LooseCut can be
used to enhance the important applications of unsupervised video
segmentation and image saliency detection.